
[AMD][TRITON_KERNELS] Improve matmul ogs config on RDNA#931

Merged: leeliu103 merged 1 commit into release/internal/3.6.x from backport-rdna-matmul-config-3.6 on Mar 17, 2026


leeliu103 commented on Mar 12, 2026

The default matmul_ogs configuration caused significant register spilling on RDNA when the batch size exceeded 512.

With the tuned configuration, GPT-OSS end-to-end throughput improves by 61% on Navi31 and 17% on Navi48.

gpt-oss-20B on 1x GPU:

| Device      | Optimization        | Throughput (reqs/sec) | Relative to baseline |
| ----------- | ------------------- | --------------------: | -------------------: |
| Navi31-48GB | Baseline            |                  3.84 |                 100% |
| Navi31-48GB | Config Optimization |                  6.18 |                 161% |
| Navi48-32GB | Baseline            |                  8.58 |                 100% |
| Navi48-32GB | Config Optimization |                 10.07 |                 117% |
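
For readers who want to check the percentage column, the following is a minimal sketch that recomputes it from the throughput figures in the table. The `results` dictionary and the `relative_throughput` helper are illustrative names introduced here, not part of the PR or of triton_kernels.

```python
# Throughput figures (reqs/sec) copied from the table above.
# The dictionary layout and helper name are assumptions made for illustration.
results = {
    "Navi31-48GB": {"baseline": 3.84, "optimized": 6.18},
    "Navi48-32GB": {"baseline": 8.58, "optimized": 10.07},
}


def relative_throughput(baseline: float, optimized: float) -> int:
    """Return optimized throughput as a whole-number percentage of baseline."""
    return round(optimized / baseline * 100)


for device, r in results.items():
    pct = relative_throughput(r["baseline"], r["optimized"])
    # A value of 161 means 1.61x the baseline, i.e. a 61% gain.
    print(f"{device}: {pct}% of baseline (+{pct - 100}%)")
```

The percentages are ratios against each device's own baseline, so the 100% rows are reference points rather than measured gains.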

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
leeliu103 force-pushed the backport-rdna-matmul-config-3.6 branch from e76303e to fd25401 on March 13, 2026 19:53
leeliu103 requested a review from antiagainst on March 13, 2026 20:12
leeliu103 merged commit b317896 into release/internal/3.6.x on Mar 17, 2026
2 of 8 checks passed